Assignment 2
Report for the second assignment of Effective MLOps: Model Development course.
Created on March 30|Last edited on April 2
Content
- Baseline Results
- Increasing the number of iterations
- Tuning the learning rate
- Tuning the boosting type
- Hyperparameter sweeping (random)
- Retraining the best model for 300 iterations
Baseline Results
Increasing the number of iterations
- Looking at the evolution of the F1 score and multi-logloss over the iterations, one can see that they have not yet reached a plateau. Hence, increasing the number of iterations would likely improve the model's performance
- As expected, the scores improve
- Looking at the plot above, one can see that the F1 score on the validation data reaches a plateau once the number of iterations exceeds 250
Tuning the learning rate
- The chart below shows that the best learning rate value is the default one, namely 0.1
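The report does not show the sweep definition; a hypothetical W&B sweep config for a grid over learning rates could look like the following. The candidate values are assumptions (only 0.1, the winner, is stated in the report):

```python
# Hypothetical W&B sweep config; the searched values are illustrative.
sweep_config = {
    "method": "grid",
    "metric": {"name": "val_f1", "goal": "maximize"},
    "parameters": {
        "learning_rate": {"values": [0.01, 0.05, 0.1, 0.2, 0.3]},
    },
}
```

Such a config would be registered with `wandb.sweep(sweep_config)` and executed by a `wandb.agent` running the training function once per value.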
Tuning the boosting type
- Again, the default value ("gbdt") achieves the best results
Hyperparameter sweeping (random)
- 15 runs, 150 iterations each
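A hypothetical random-search sweep config matching this setup; the parameter ranges are assumptions (the report only states the run count and iteration budget, and mentions `num_leaves` and `reg_lambda` in the conclusions):

```python
# Hypothetical W&B random sweep; ranges are illustrative, not the report's actual search space.
sweep_config = {
    "method": "random",
    "metric": {"name": "val_f1", "goal": "maximize"},
    "parameters": {
        "n_estimators": {"value": 150},  # 150 iterations per run, as stated
        "learning_rate": {"distribution": "log_uniform_values", "min": 0.01, "max": 0.3},
        "num_leaves": {"distribution": "int_uniform", "min": 15, "max": 127},
        "reg_lambda": {"distribution": "uniform", "min": 0.0, "max": 1.0},
    },
}
# The 15 runs would come from e.g. wandb.agent(sweep_id, train_fn, count=15).
```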
Retraining the best model for 300 iterations
- Conclusions
- Unfortunately, after the sweep, the F1 score barely increased (by 0.0005) in comparison with the model with default parameters run for 300 iterations. Nevertheless, compared to the baseline model, the F1 score is 6% better
- Exploring the visualisation from the last sweep suggests that increasing the number of leaves and the L2 regularisation parameter (reg_lambda) could further improve the results
- The default parameters of the model, especially the learning_rate and boosting_type, work remarkably well for the given data, as shown by the first two sweeps